Apparatus for automatically generating source code

ABSTRACT

A method of automatically generating software from one or more predefined functions in accordance with an input statement entered in natural language, the method comprising the steps of:  
     (i) analysing the input statement for its semantic content, so as to extract first semantically meaningful elements from the input statement;  
     (ii) analysing the one or more predefined functions for their semantic content, so as to extract one or more sets of second semantically meaningful elements from the one or more predefined functions;  
     (iii) identifying at least one of a condition, an action and/or a statement in the input statement;  
     (iv) comparing the first semantically meaningful elements with the second semantically meaningful elements so as to identify one or more predefined functions that correspond to one or more action and/or statement of the input statement;  
     (v) combining at least some of the first semantic elements in accordance with any conditions identified at step (iii) so as to generate corresponding condition variables;  
     (vi) combining functions and condition variables identified at steps (iv) and (v) according to a set of predetermined rules in order to generate the software.

[0001] The present invention relates to apparatus for automatically generating source code, and is particularly, but not exclusively, suitable for generating source code for communication services.

[0002] Traditionally, software development has comprised several identifiable processes: requirements capture, where customer requirements are broken down into fundamental descriptions that can be used to create specifications; design of software elements to these specifications; implementation of the software elements to create a software deliverable; and maintenance of the software deliverable. In many cases, the customer requirements further include developing hardware, which will be integrated with the software deliverable. All of these processes are time consuming and costly in their preparation, and often there are integration and implementation problems. In order to correct these problems, some re-design may be required, which often delays the down-streaming of the deliverables and adds significantly to the costs.

[0003] Several groups have focussed on identifying areas in the development process that could be pruned to offer time and cost savings, noting in particular that around sixty to seventy percent of a system's functionality duplicates that of other systems. There is thus significant interest in developing tools that generate software automatically, as this offers reductions in software design stage costs. AT&T have disclosed, in “Object Magazine 5, 1995”, a tool that can generate object-oriented code from graphical models. However, ideally automatic code generators should be adaptable to different platforms, different standards, and different languages, and not be restricted to generating object-oriented code.

[0004] Automating the validation of code could also offer significant cost savings, as identified by the British Aerospace Dependable Computer System Centre in York, in “Qualification of automatic code generation using formal techniques”¹. The paper presents an animation facility to validate the code, which embeds formal methods to perform the validation itself.

[0005] There are several quasi-automatic code generators, such as the “wizards” developed by the Microsoft™ Corporation; these create basic class template syntax, leaving the programmer to insert the code that is specific to the application under development. However, these are language specific, are limited to producing code templates, and require the user to have a working knowledge of the language itself. Another quasi-automatic method of code generation includes “forms”, where a user fills in fields comprising the form. However, the entries must adhere to a specific format, and the functionality of the code that is generated is extremely limited.

[0006] Graphical methods of generating code are also well known. For example, the JBuilder™ product from Borland incorporates a GUI designer by which the software developer can use a visual tool to draw the required user interface elements. The system then produces appropriate Java source code to handle these elements automatically, and allows the developer to merge this with conventionally-written code. Other systems such as Rational Rose™ and Oracle Designer™ allow the developer to express the program logic using graphical symbols, and then generate code automatically. In all of these cases the user must have a knowledge of the graphical notation used, which may be Unified Modeling Language (UML) or some other convention. In addition, the user must have a good understanding of the programming language used in order that he or she can fill in certain parts of the template code produced, and also interface the automatically generated code with other parts of the software application. This restricts the usefulness of this type of system to experienced software programmers.

[0007] There are many situations where it is desirable for a non-programmer to be able to program a system so that it can subsequently act on his or her behalf without further interaction. A telephone answering machine is a simple example of such a system; the user implicitly instructs the device to answer the telephone call and to record a message in his or her absence. Another well-known example is the video recorder, which may be set to record a programme when the user is out or fast asleep. However, it is well known that many people have difficulty even with the relatively simple task of programming a video recorder. In addition, even experienced programmers make errors, particularly when dealing with complex logic, and the process of testing that the program behaves as required (debugging) is a well established part of the software development process.

[0008] As e-commerce continues to develop, examples of systems to which a user delegates some of his or her authority will become more widespread. A recent example is that of a proxy, used in on-line auctions. The user can instruct his or her proxy to bid up to a certain amount for a particular item. Future systems may allow much more complex negotiations to be carried out in real time, following the instructions laid down previously by the human user. If these systems are to be used and trusted, it is essential that users without programming experience can program them effectively and have confidence that the system will subsequently exhibit the appropriate behaviour. Preferably this should not require the user to learn a programming language or a particular graphical notation.

[0009] According to one aspect of the present invention there is provided a method of automatically generating software from one or more predefined functions in accordance with an input statement entered in natural language, the method comprising the steps of:

[0010] (i) analysing the input statement for its semantic content, so as to extract first semantically meaningful elements from the input statement;

[0011] (ii) analysing the one or more predefined functions for their semantic content, so as to extract one or more sets of second semantically meaningful elements from the one or more predefined functions;

[0012] (iii) identifying at least one of a condition, an action and/or a statement in the input statement;

[0013] (iv) comparing the first semantically meaningful elements with the second semantically meaningful elements so as to identify one or more predefined functions that correspond to one or more action and/or statement of the input statement;

[0014] (v) combining at least some of the first semantic elements in accordance with any conditions identified at step (iii) so as to generate corresponding condition variables;

[0015] (vi) combining functions and condition variables identified at steps (iv) and (v) according to a set of predetermined rules in order to generate the software.

[0016] Embodiments of the invention will now be illustrated, by way of example only, with reference to the accompanying drawings, in which:

[0017] FIG. 1a is a schematic diagram showing apparatus for automatically generating source code according to an embodiment of the present invention;

[0018] FIG. 1b is a schematic diagram showing apparatus for automatically generating source code according to a further embodiment of the present invention;

[0019] FIG. 2 is a schematic diagram showing data storage providing part of the apparatus of the embodiments of either FIG. 1a or FIG. 1b;

[0020] FIG. 3a is a schematic diagram showing analysing means providing part of the apparatus of the embodiments of either FIG. 1a or FIG. 1b;

[0021] FIG. 3b is a schematic diagram showing the relationship between a functional definition located in the data storage of FIG. 2 and its functional capability;

[0022] FIG. 4 is a schematic block diagram showing the steps involved in analysing a function description for its semantic content;

[0023] FIG. 5 is a schematic block diagram showing the steps involved in analysing an input statement for its semantic patterns, logical structure, semantic content and for matching semantic content between the function descriptions and the input statement;

[0024] FIG. 6a is a schematic diagram showing categorisation of phrases;

[0025] FIG. 6b is a schematic diagram showing analysis of semantic content of phrases;

[0026] FIG. 7 is a block diagram showing a terminal utilised in a second embodiment of the invention and corresponding to that shown in FIG. 1;

[0027] FIG. 8 is a block diagram showing an arrangement of lexical components according to a third embodiment;

[0028] FIG. 9 is a block diagram showing in greater detail the components comprising the client terminal shown in FIGS. 1a and 1b;

[0029] FIG. 10 is a block diagram showing in greater detail the processes present in the client terminal shown in FIGS. 1a and 1b;

[0030] FIG. 11 is a block diagram showing in greater detail the components comprising the server shown in FIGS. 1a and 1b;

[0031] FIG. 12 is a block diagram showing a possible implementation configuration for software automatically generated using the apparatus shown in FIGS. 1a and 1b.

[0032] Various phrases are used in the following description, and in the context of the present invention these are defined as follows:

[0033] “Semantically meaningful elements” are elements found in natural language and may be defined with reference to the following example: “The cat sat on a mat”:

[0034] a) Meaningful semantic entities, typically denoted by nouns. In the example, the semantic entities are “cat” and “mat”.

[0035] b) The form of each of the entities (e.g. whether it is singular or plural), and whether it is in the definite or indefinite form. In the example, “the cat” is singular, and “the” indicates that it is in the definite form. “Mat” is singular, and “a” indicates that it is in the indefinite form.

[0036] c) “States of affairs”, generally indicated by verbs. States of affairs indicate either actions, as most verbs do, or states of being (e.g. the verb “to be”). In this example, “sat” is a state of affairs.

[0037] d) The conditions attached to each state of affairs (e.g. the tense of the verb concerned).

[0038] e) Modifiers (e.g. adverbs or adjectives) which ascribe properties or otherwise modify an entity or state of affairs.

[0039] f) The linkages between the occurrences of the foregoing (e.g. which entities a state of affairs affects and how; and which entities or state of affairs a modifier modifies).
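
By way of illustration only, elements (a) to (f) for the example sentence could be held in a small data structure such as the following. This is a minimal sketch and not the representation used by the embodiment (which stores Prolog facts, as described later); the class names, field names and the role labels “agent” and “location” are hypothetical. Modifiers (e) are omitted because the example sentence contains none.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container for the semantically meaningful elements of one phrase. */
final class SemanticContent {

    /** Elements (a) and (b): an entity, its number and its definiteness. */
    static final class Entity {
        final String noun;
        final boolean plural;
        final boolean definite;

        Entity(String noun, boolean plural, boolean definite) {
            this.noun = noun;
            this.plural = plural;
            this.definite = definite;
        }
    }

    /** Elements (c) and (d): a state of affairs and its tense. */
    static final class StateOfAffairs {
        final String verb;
        final String tense;

        StateOfAffairs(String verb, String tense) {
            this.verb = verb;
            this.tense = tense;
        }
    }

    /** Element (f): how a state of affairs is linked to an entity. */
    static final class Link {
        final StateOfAffairs state;
        final String role; // assumed label, e.g. "agent" or "location"
        final Entity entity;

        Link(StateOfAffairs state, String role, Entity entity) {
            this.state = state;
            this.role = role;
            this.entity = entity;
        }
    }

    final List<Entity> entities = new ArrayList<>();
    final List<StateOfAffairs> states = new ArrayList<>();
    final List<Link> links = new ArrayList<>();

    /** Builds the elements for "The cat sat on a mat". */
    static SemanticContent catExample() {
        SemanticContent content = new SemanticContent();
        Entity cat = new Entity("cat", false, true);   // singular, definite ("the")
        Entity mat = new Entity("mat", false, false);  // singular, indefinite ("a")
        StateOfAffairs sat = new StateOfAffairs("sit", "past");
        content.entities.add(cat);
        content.entities.add(mat);
        content.states.add(sat);
        content.links.add(new Link(sat, "agent", cat));
        content.links.add(new Link(sat, "location", mat));
        return content;
    }
}
```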

[0040] “semantic content”: a collection of semantically meaningful elements (as defined above) comprising a phrase;

[0041] “semantic pattern”: semantic identifiers that represent a relationship between semantically meaningful elements;

[0042] “logical structure”: logical flow of information, in the form of Boolean operators (and, or, if, then) and conditions and actions that are dependent on the operators;

[0043] “input statement”: phrase or sentence entered by a user for analysis of its semantic content;

[0044] “function”: a named part of a computer program that can be invoked from other parts of a program as needed;

[0045] “function definition”: the name of a function, the type of the value it returns (if any), and the types and names of its arguments (if any) and the code comprising the function. The form of a function definition is language-specific, in terms of variable types, variable declaration, and code syntax;

[0046] “function description”: natural language phrase or sentence describing the functional capability of a function.

[0047] Further, in the following description, a “user” is not necessarily limited to a human entity, as it might well be for instance another piece of equipment or a software agent.

[0048] Overview

[0049] Broadly, in a telecommunications environment for instance, using an embodiment of the present invention, a user can input a natural language instruction (an “input statement”) and it will be analysed and used to put together source code customised for carrying out the user's wishes in that environment. For instance, the user might make the input statement “Please divert my phone to Frank's”. That statement will be analysed and used to generate source code that causes call divert on incoming calls for that user to the telephone number of the named person. The source code so generated will need to be appropriate to the current telecommunications environment for the user, for instance in terms of operating systems, database query languages and transmission protocols, but the user needs no technical knowledge of that environment. Further, embodiments of the present invention can be transferred to a different technical environment for the same user relatively easily.

[0050] With reference to FIG. 1 of the accompanying drawings, apparatus 100 for analysing input statements and automatically generating customised source code according to the present invention may generally be referred to as a ‘software generator’ 100. The generator 100 is shown divided into the following functional parts:

[0051] DATA ANALYSER 102

[0052] CODE GENERATOR 103

[0053] COMPUTER 105

[0054] DATA STORAGE 106

[0055] The COMPUTER 105 can either be a standalone machine, as shown in FIG. 1a, or a server computer that receives input from a client terminal 101 via a communications network 108, as shown in FIG. 1b. The software generator 100 may be built into a telephone, or a mobile phone using speech recognition and synthesis for input and output respectively.

[0056] When the computer is a server, as shown in FIG. 1b, the computer 105 may be additionally connected to external data stores 112 via a communications network such as the Internet 110. Information about the Internet can be found, for example, from The World Wide Web Handbook from International Thomson Computer Press, ISBN: 1-850-32205-8. The terms “client” and “server” are illustrative but not limiting to any particular architecture.

[0057] The DATA STORAGE 106 functional part of the apparatus is located on the computer 105 and includes one or more data stores comprising predefined functions, referred to as function definitions. These function definitions are used as described later in the generation of the software code, and include the code comprising a function, the name of each function and any arguments it takes and their types. In addition, each predefined function has an associated natural language description, referred to as a function description, from which the data analyser 102 can extract the functional capability of corresponding predefined functions. This process is explained in detail with reference to the embodiment below. The data storage 106 also includes linguistic stores comprising multilingual lexicons and linguistic, semantic and syntactic information. The data analyser 102 accesses the linguistic stores in order to resolve the meanings of input statements and function descriptions.

[0058] The DATA ANALYSER 102 functional part of the apparatus is located on the computer 105 and includes analysing means and comparing means. The data analyser 102 is used for analysing and resolving the meaning of function descriptions that are stored in the data storage 106 and of input statements, so as to identify the functional requirements of input statements and relate them to the functional capability of predefined functions. In descriptive terms, the data analyser 102 matches these functional requirements with predefined functions that have been determined to have the functional capability of the functional requirement. In mechanistic terms, the data analyser 102 determines the semantic content of the input statement and compares the input statement semantic content with the semantic content of a plurality of function descriptions (which have been similarly analysed for their semantic content). If there is a match between the semantics of the input statement and one of the function descriptions then the corresponding function is considered to meet the functional requirement of the input statement.
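
The comparison performed by the data analyser 102 can be pictured with a deliberately crude stand-in: reduce the input statement and each function description to a set of keywords, map synonyms onto a common form, and pick the function whose description shares the most keywords. This keyword overlap is an assumption made for illustration and is far simpler than the semantic analysis described in this specification; the stop-word list, the synonym table and the class and method names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative keyword-overlap matcher; not the embodiment's semantic comparison. */
final class FunctionMatcher {

    // Hypothetical stop words that carry no functional meaning.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "to", "my", "please"));

    // Hypothetical synonym table standing in for the linguistic stores.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("divert", "forward");
        CANONICAL.put("send", "forward");
    }

    /** Lower-cases the text, drops stop words and maps synonyms to one form. */
    static Set<String> keywords(String text) {
        Set<String> result = new HashSet<>();
        for (String word : text.toLowerCase().split("[^a-z']+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            result.add(CANONICAL.getOrDefault(word, word));
        }
        return result;
    }

    /** Returns the name of the best-overlapping function, or null if none overlaps. */
    static String bestMatch(String inputStatement, Map<String, String> descriptionsByName) {
        Set<String> wanted = keywords(inputStatement);
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, String> entry : descriptionsByName.entrySet()) {
            Set<String> shared = keywords(entry.getValue());
            shared.retainAll(wanted); // keep only keywords present in both
            if (shared.size() > bestScore) {
                bestScore = shared.size();
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

With the description of forwardCall given later in this specification and the input statement “Please divert my phone to Frank's”, the shared keywords are “forward” and “phone”, so forwardCall is selected over a function whose description shares none.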

[0059] The CODE GENERATOR 103 functional part of the apparatus is located on the computer 105 and is used for generating source code from whichever predefined functions have been identified by the data analyser 102.

[0060] Brief Overview of Operation

[0061] A user enters an input statement at the client terminal 101. The input statement comprises any natural language input, such as a statement, conditions and actions, and describes a desired functionality of code to be generated by the generator 100. The user may also specify, using a standard file management browser (e.g. Windows Explorer™), a class or folder in which function descriptions, which relate to the input statement functionality, are located. The input statement is passed to the data analyser 102 for semantic analysis in order to extract the functional requirement of the input statement. The data analyser 102 then retrieves whichever function descriptions are stored at the specified location from the data storage 106 and analyses the function descriptions for their semantic content in order to determine the functional capability of the corresponding functions. The data analyser 102 checks, as is described in detail below, by comparing the semantics of the input statement against the semantics of the function descriptions, that there is a function that meets the functional requirement of the input statement. Assuming that there is a suitable function, the data analyser 102 retrieves the corresponding code comprising the function from the data storage 106. The semantic analysis performed on the input statement by the data analyser 102 also identifies conjunctions from the input statement, and these, together with the retrieved code, are passed to the code generator 103. The code generator 103 translates the conjunctions into logical operators, and inserts said operators, together with the retrieved code, into a predetermined template, according to a set of predetermined rules, thereby creating a processable computer program. Having assembled these components to form the program, the code generator 103 loads the program onto a terminal, which could be a network device such as a router, a telephone, a server computer or a client computer, for subsequent processing.
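
The final stage described above, translating conjunctions into logical operators and inserting them with the retrieved code into a template, might look roughly as follows. This is a sketch under stated assumptions: the template is a single Java `if` block, only “and” and “or” are handled, and the class name, method names and condition variable names are hypothetical rather than taken from the embodiment.

```java
import java.util.List;

/** Illustrative template-based generator; the template and names are assumptions. */
final class TemplateCodeGenerator {

    /** Translates a natural language conjunction into a Java logical operator. */
    static String operatorFor(String conjunction) {
        switch (conjunction.toLowerCase()) {
            case "and":
                return "&&";
            case "or":
                return "||";
            default:
                throw new IllegalArgumentException("Unsupported conjunction: " + conjunction);
        }
    }

    /**
     * Joins the condition variables with the translated operators and places
     * the result, together with the action call, into a fixed if-block template.
     * Operators are emitted left to right without extra parentheses, so mixed
     * "and"/"or" inputs follow normal Java operator precedence.
     */
    static String generate(List<String> conditionVariables,
                           List<String> conjunctions,
                           String actionCall) {
        if (conditionVariables.isEmpty()
                || conjunctions.size() != conditionVariables.size() - 1) {
            throw new IllegalArgumentException("Need n conditions and n-1 conjunctions");
        }
        StringBuilder condition = new StringBuilder(conditionVariables.get(0));
        for (int i = 0; i < conjunctions.size(); i++) {
            condition.append(' ')
                     .append(operatorFor(conjunctions.get(i)))
                     .append(' ')
                     .append(conditionVariables.get(i + 1));
        }
        return "if (" + condition + ") {\n    " + actionCall + ";\n}\n";
    }
}
```

For two condition variables named `inMeeting` and `callIsUrgent` joined by “and”, with the action `forwardCall(mobileNumber)`, the sketch emits an `if (inMeeting && callIsUrgent)` block containing the call.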

[0062] The present invention thus:

[0063] can make use of existing functions and/or methods;

[0064] can use functions in any language and generate code in any language;

[0065] does not require the user to be computer code literate;

[0066] allows the user to enter requirements using natural language; input is not required in a standard format;

[0067] enables validation of the software generated via input statements in the form of questions.

[0068] First Embodiment of the Invention: Resolution of Input Statements into Source Code

[0069] An embodiment of the present invention is operable to automatically resolve input statements into source code, provided the software generator 100 has access to source code that it can identify as providing the functional requirements of the input statements. In particular, the present embodiment concerns generation of software to handle telephone calls. The software to be generated will implement requested behaviour by running a number of predefined functions which carry out the lower-level actions of ringing a telephone, forwarding a call etc.

[0070] In practice, it will be understood that the generated software could either comprise the actual code providing a set of functions to implement requested behaviour, or it could comprise a set of calls on code which is actually located elsewhere in a network. In the latter case, the generated software thus triggers calls to this code.

[0071] The embodiment is described below in more detail, in the following order: firstly analysis of one or more predefined functions, secondly analysis of an input statement, and thirdly subsequent combining of these analyses.

[0072] Analysis of One or More Predefined Functions

[0073] Function Descriptions

[0074] Referring to FIG. 2, information about the functions that are accessible to the generator 100 is stored in data storage 106, specifically in a code database 200, which comprises predefined function definitions. As described briefly above, the predefined functions are accompanied by function descriptions, which, in essence, describe the functional capability of the predefined function. The function descriptions may be given by a separate description that accompanies the functions. In accordance with the conventions used in Java, it will be understood that a method is used to denote a function, and the terms method and function may be used interchangeably when describing the present embodiment. The function description will normally be written by a software developer who has an understanding of the operation of the classes, objects and methods. Descriptions may follow the established conventions for Java documentation (for example, see The Design of Distributed Hyperlinked Programming Documentation (IWHD '95), a paper on the design of javadoc, the Java Software tool for generating web-based API documentation, presented at the International Workshop on Hypermedia Design '95), but may also need to take account of the requirements of the data analyser 102, as described later.

[0075] The code database 200 may comprise one or more data files 201 containing predefined function definitions, libraries 203 of predefined function definitions and/or links 205a to remote stores 112 where predefined function definitions 205b are located. In the present embodiment, the predefined function definitions are written in Java, but the database may be populated with definitions and descriptions for functions written in any programming language.

[0076] The data analyser 102 identifies the functional capabilities of the predefined functions by analysing the semantic content of the function descriptions. As shown in FIG. 3b, function descriptions may conveniently be written as comments 311 in the Java source code. For example, a well known feature of the Java programming language is that information included between the “/** . . . */” symbols is treated as a comment. When these comments are provided as a precursor to a method or class declaration in the Java source, they may be put together with a function definition 310 and translated into a documentation file by running a special program called “javadoc”. Javadoc is the tool from Sun Microsystems for generating API documentation in HTML format from doc comments in source code (further information is available from the Hypermedia paper referenced above).

[0077] For example:

COMMENT 311:

/**
 * Causes a phone to forward an incoming call to a nominated person.
 * This function requires the extension number that you want to forward your calls to
 */

FUNCTION DEFINITION 310:

public void forwardCall (String phone_number){ .... ...}

[0078] is compiled by javadoc into:

[0079] Documentation File

[0080] forwardCall(String)

[0081] Causes a phone to forward an incoming call to a nominated person.

[0082] Thus the function description 313 for this function forwardCall is “Causes a phone to forward an incoming call to a nominated person”. The function description 313 also details input parameters that are required for the function to operate; it is convenient to split the function description into a utility description 315 “A function which causes a phone to forward an incoming call to X”, and an input parameter 317 “nominated person”. It is understood that splitting the function description, as presented in FIG. 3b, into utility description and input parameter is inessential to the invention.
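
One way to recover a function description such as the one above from a doc comment is to strip the comment markers and keep the first sentence. The following is a minimal sketch of that idea only; it is not the javadoc tool and not the embodiment's query analyser, the class and method names are hypothetical, and the sentence boundary is assumed to be the first full stop followed by white space or the end of the text.

```java
/** Illustrative doc-comment reader; a simplification, not the javadoc tool. */
final class DocCommentReader {

    /** Strips the comment delimiters and leading asterisks; joins lines with spaces. */
    static String text(String docComment) {
        String body = docComment.trim();
        if (body.startsWith("/**")) {
            body = body.substring(3);
        }
        if (body.endsWith("*/")) {
            body = body.substring(0, body.length() - 2);
        }
        StringBuilder joined = new StringBuilder();
        for (String line : body.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("*")) {
                trimmed = trimmed.substring(1).trim();
            }
            if (trimmed.isEmpty()) {
                continue;
            }
            if (joined.length() > 0) {
                joined.append(' ');
            }
            joined.append(trimmed);
        }
        return joined.toString();
    }

    /** Returns the text up to and including the first sentence-ending full stop. */
    static String firstSentence(String text) {
        for (int i = 0; i < text.length(); i++) {
            boolean atEnd = i + 1 == text.length();
            if (text.charAt(i) == '.'
                    && (atEnd || Character.isWhitespace(text.charAt(i + 1)))) {
                return text.substring(0, i + 1);
            }
        }
        return text; // no full stop found: treat the whole text as one sentence
    }
}
```

Applied to COMMENT 311, `firstSentence(text(comment))` yields the sentence beginning “Causes a phone to forward”, and the remainder of the text carries the input parameter information.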

[0083] FIG. 3a shows the analysing means comprising a query analyser 301 which has access to a linguistic store 303 located in the data storage 106. The linguistic store 303 is used to find synonyms, along with semantically equivalent forms of the various types of inputs received by the query analyser 301, and contains representations of pragmatic knowledge needed by the semantics module (for example that “dial Mary” is a shorthand form, which should be treated more fully as “dial Mary's telephone number”). The steps involved in analysing the function description for its semantic content are shown in FIG. 4:

[0084] S4.1 Query analyser 301 analyses the function description 313 in order to extract a utility description 315 (A function which causes a phone to forward an incoming call to a nominated person) and an input parameter description 317 (String) (described above with reference to FIG. 3b);

[0085] S4.2 Query analyser 301 analyses the utility description 315 for its semantic content. As is known in the art, natural language parsers perform semantic analysis, and the general operation of such parsers is well known. The specific parser utilised in the present invention analyses the utility description 315 in the following manner:

[0086] The utility description 315 is broken up into characters;

[0087] Any white spaces are found and these are used to determine locations of the word boundaries;

[0088] The characters are then put back together to form the respective words;

[0089] The words are then all converted to lower case;

[0090] This is then stored as a list;

[0091] The list is analysed to determine what sort of sentence it is (declarative, imperative, Yes/No question, which question, If/then condition etc.) (FIG. 6a, described below);

[0092] Each word on the list is analysed for its semantics and its relationship with the rest of the words, and this generates a list of semantics (FIG. 6b, described below).
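
The first five of the steps listed above (split into characters, locate white space, reassemble words, convert to lower case, store as a list) can be sketched as follows. The class and method names are hypothetical, and the later steps (sentence categorisation and per-word semantic analysis) are not attempted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative pre-processing of a description or input statement into a word list. */
final class Preprocessor {

    /** Walks the characters, uses white space as word boundaries, lower-cases each word. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                // A white space closes the word collected so far, if any.
                if (current.length() > 0) {
                    words.add(current.toString().toLowerCase(Locale.ROOT));
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return words;
    }
}
```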

[0093] Furthermore, the base form of the operative verb, which for the above example is “forward”, is analysed for synonyms, along with semantically equivalent forms using derivational morphology in a linguistic store 303, giving a list of properties such as send, deliver, give etc. The other properties of the subject, which is the telephone, are extracted in a similar way, such that “incoming call(s)” and “nominated person” are assigned semantic meanings and alternatives;

[0094] S4.3 Query analyser 301 analyses the input parameter description 317 in order to understand the number, and type, of parameters, or arguments, required by the function (following an identical procedure to that described above with reference to S4.2).

[0095] In the present embodiment, the query analyser 301 generates Prolog facts to represent the semantically meaningful elements generated at steps S4.2 and S4.3, and these facts are stored locally, for example in a temporary database or in memory. The semantics of the function are stored in the form:

method(forward_Call, sem([forward(_1, forward:v:_), r(_2, patient, _1, _3), e(_3, call:n:_)]), param(["the extension number that you want to forward calls to"])),  (Expr. 1)

[0096] which means that the function name, function semantics and arguments required by the function are stored. The Prolog mechanisms involved are explained in introductory textbooks on the language, for example Clocksin and Mellish, “Programming in Prolog”, Springer-Verlag, 1987.
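
For readers more familiar with Java than Prolog, the three parts held by a fact of the form of Expr. 1 (function name, semantics, parameters) can be mirrored in a small class that renders itself as a fact string. This is an illustrative sketch only: the class name, field names and `toProlog` method are hypothetical, the semantic terms are passed through as opaque strings, and no escaping of quotes inside parameter descriptions is attempted.

```java
import java.util.List;

/** Illustrative holder for the name, semantics and parameters of one method fact. */
final class MethodFact {
    final String name;
    final List<String> semantics;  // semantic terms, treated as opaque strings
    final List<String> parameters; // natural language parameter descriptions

    MethodFact(String name, List<String> semantics, List<String> parameters) {
        this.name = name;
        this.semantics = semantics;
        this.parameters = parameters;
    }

    /** Renders the fact in a method(Name, sem([...]), param([...])) layout. */
    String toProlog() {
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < parameters.size(); i++) {
            if (i > 0) {
                params.append(',');
            }
            params.append('"').append(parameters.get(i)).append('"');
        }
        return "method(" + name
                + ",sem([" + String.join(",", semantics) + "])"
                + ",param([" + params + "])).";
    }
}
```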

[0097] Some of the function descriptions may have been pre-processed for their semantic meaning by the query analyser 301, or may be processed concurrently with submission of an input statement by the client 101. In the latter case the user may be required to specify the class, or classes, in which coded functions corresponding to potential input statements are located, and the results from the analysis would be written to a temporary file for subsequent access. For the purposes of the present description, where the predefined methods are written in Java, it is assumed that the method descriptions are analysed for their semantic content in parallel with analysis of input statements.

[0098] Analysis of an Input Statement

[0099] When the generator 100 is loaded on the server computer 105, input statements, which are entered in natural language, are submitted to the generator via a browser 104. The statements may comprise condition/action information or factual information, such as:

[0100] Condition/Action:

[0101] (i) “If I am in a meeting and the caller is from outside BT, then you should take a message”, or

[0102] (ii) “Calls should be forwarded if I am in a meeting and the call is urgent”

[0103] Factual:

[0104] (i) “My mobile phone number is 07970 111111”

[0105] (ii) “I am in a meeting from 1 pm to 3 pm”

[0106] Thus the input statement may be considered to have certain functional requirements: for the Condition/Action example (i) above, the requirement is that “a message should be taken given the conditions that I am in a meeting and the caller is from outside BT”. As described above, this functional requirement is a semantic representation of the input statement and is extracted by analysing means forming part of the data analyser 102, as described below.

[0107] The input statement is firstly pre-processed to remove white spaces etc. as described above in S4.2, and is then categorised into a type of input statement. The category identifies both the semantic patterns comprising the input statement and the logical structure of the input statement. FIG. 6a shows a semantic tree diagram having a plurality of types of input statements, predetermined paths as a function of the type of input statement and semantic patterns in those paths. The semantic patterns may specify actions, conditions, and statements, each of which requires identifying if the input statement is to be coded by the code generator 103. Then the semantic content of the input statement is analysed by the query analyser 301, and instantiated against the function descriptions previously generated at step S4.2 in order to identify function descriptions (thus processable events) that fulfil the input statement functional requirements. With reference to FIG. 5, this procedure can be identified as having the following distinct parts (information relating to each point is expanded below):

[0108] S5.1 Categorise the input statement into declarative, imperative, conditional, Yes/No question, who/what/where/when question etc.

[0109] S5.2 Extract the semantic patterns and logical structure of the input statement.

[0110] S5.3 Extract the semantic content of the input statement.

[0111] S5.4 Identify the presence or otherwise of predefined functions that correspond to the semantic content of the input statement.

[0112] S5.1 Categorise the Input Statement into Declarative, Imperative,Conditional, Yes/No Question, Who/what/where/when Question etc

[0113] As the type of input statement governs the semantic patterns and logical structure of the input statement, as shown in FIG. 6a, once the type of input statement has been derived, the query analyser 301 is able to search for a well defined set of semantic patterns. It is crucial for the system to be able to identify semantic patterns, as this is used to identify the occurrence of logic statements and/or factual statements, conditions, actions etc. that will be translated into source code by the code generator 103.

[0114] S5.2 Extract the Semantic Patterns and Logical Structure of the Input Statement

[0115] Depending on the type of input statement identified at S5.1, the input statement is analysed for specific semantic patterns. As shown in FIG. 6a, the query analyser 301 branches into a semantic path, once the type of input statement has been identified, analysing the input statement for the semantic patterns listed therein.

[0116] The above input statement example: “If I am in a meeting and the call is urgent, forward the call to my mobile” is categorised as a declarative type of input statement. Following the corresponding branch 601 in FIG. 6a, the query analyser 301 searches for an instance of logical structure

IF(X, Y) 603, which represents (if (condition: X 605) then (action: Y 607)).

[0117] For this example, the semantic pattern that indicates the IF(X, Y) 603 structure is r(1000,sconj(if),1001,1002), where sconj represents two phrases joined by a conjunction. The query analyser 301 identifies the presence or otherwise of this in the input statement by searching for an sconj expression. In the present example, the query analyser 301 will also detect “and”, which indicates a second condition, AND(X1, X2) where X2 is the second condition: r(1007,sconj(and),1002,1008). Clearly the semantic patterns that are used to extract the semantic content at S4.2 may vary between parsers, and this example is merely illustrative of the general technique. Thus the present example input statement is analysed as having the following structure:

[0118] Sentence (declarative)→If I am in an important meeting and the call is urgent, you should forward the telephone call to my mobile

[0119] if(X,Y)→If (I am in an important meeting and the telephone call is urgent, you should forward the telephone call to my mobile)

[0120] X→I am in an important meeting and the telephone call is urgent

[0121] and(X1,X2)→And(I am in an important meeting, the telephone call is urgent)
X1 → I am in an important meeting → Condition 1
X2 → the telephone call is urgent → Condition 2
Y → you should forward the telephone call to my mobile → Action
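By way of illustration only, the decomposition above into IF(X, Y) and AND(X1, X2) might be sketched as follows. This is a minimal sketch and not the query analyser 301: plain string matching stands in for the parser's sconj(if) and sconj(and) patterns, and the class name LogicalStructure and method decompose are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the logical structure IF(AND(X1, X2, ...), Y). */
final class LogicalStructure {
    final List<String> conditions; // X1, X2, ...
    final String action;           // Y

    LogicalStructure(List<String> conditions, String action) {
        this.conditions = conditions;
        this.action = action;
    }

    /**
     * Splits "If X1 and X2, Y" into its conditions and its action.
     * ASSUMPTION: string matching is used here purely as a stand-in for
     * the semantic patterns produced by a natural language parser.
     */
    static LogicalStructure decompose(String statement) {
        String s = statement.trim();
        if (!s.regionMatches(true, 0, "if ", 0, 3)) {
            // No IF(X, Y) structure: treat the whole statement as an action.
            return new LogicalStructure(new ArrayList<>(), s);
        }
        int comma = s.lastIndexOf(", ");
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'If X, Y': " + s);
        }
        String x = s.substring(3, comma).trim();  // condition part X
        String y = s.substring(comma + 2).trim(); // action part Y
        // AND(X1, X2): split the condition part on the conjunction "and".
        List<String> conditions =
            new ArrayList<>(Arrays.asList(x.split("\\s+and\\s+")));
        return new LogicalStructure(conditions, y);
    }

    public static void main(String[] args) {
        LogicalStructure ls = decompose(
            "If I am in an important meeting and the telephone call is urgent, "
            + "you should forward the telephone call to my mobile");
        for (int i = 0; i < ls.conditions.size(); i++) {
            System.out.println("X" + (i + 1) + " -> " + ls.conditions.get(i));
        }
        System.out.println("Y -> " + ls.action);
    }
}
```

A statement without a leading “if”, such as “divert call”, is returned with no conditions and the whole text as the action.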

[0122] S5.3 Extract the Semantic Content of the Input Statement

[0123] The parts of the input statement that correspond to X1, X2, and Y are passed, in turn, for analysis of their semantic content, as shown in FIG. 6b:

X1 → I am in an important meeting → Condition 1
  Event 608 → be
    Agent relation 610 → ✓
      Entity 611 → I
    Patient relation 609 → in
      Entity 611 → meeting
        Modifier relation 613 → value
          Modifier 614 → important
X2 → the telephone call is urgent → Condition 2
  Event 608 → be
    Agent relation 610 → ✓
      Entity 611 → call
        Modifier relation 613 → attribute
          Modifier 614 → urgent
Y → you should forward the telephone call to my mobile → Action
  Event 608 → forward
    Agent relation 610 → ✓
      Entity 611 → you
    Patient relation 609 → ✓
      Entity 611 → call
    Adverbial 612 → ✓
      Relation 615 → to
      Entity 611 → my mobile

[0124] Event 608→forward

[0125]  Agent relation 610→✓

[0126] Entity 611→you

[0127]  Patient relation 609→✓

[0128] Entity 611→call

[0129] Adverbial 612→✓

[0130] Relation 615→to

[0131] Entity 611→my mobile

[0132] When analysing the semantic content of either the function description or the input statement, the respective semantic analyses may identify one or more ambiguities. If this occurs with the input statement, the query analyser 301 sends a message to the client 101, asking the user to resolve between possible semantic representations. Furthermore, the query analyser 301 may find that more than one function description meets the functional requirement of the input statement; in this situation, the analyser 301 sends a message to the client 101 asking the user to select one of the function descriptions. It should be noted that ambiguities may also occur with the function descriptions, in which case the description would have been amended by the software developer at an earlier stage.

[0133] Subsequent Combining of These Analyses

[0134] S5.4 Identify Predefined Functions Corresponding to the Semantic Content of the Input Statement

[0135] S5.4.1

[0136] The comparing means 305 compares the uninstantiated semantic content of the input statement with the semantic content of the predefined function descriptions (re-cap: these identify one or more processable functions) until the input statement semantics match the function description semantic content. For the above example, where the input statement includes the action, Y, “forward a call to my mobile”, the semantics for this are:

event(22,forward:v:_),r(33,patient,22,44),e(44,call:n:_)

[0137] The comparing means 305 will search for

method(Method,sem(event(22,forward:v:_),r(33,patient,22,44),e(44,message:n:_)),param(Param))  (Expr. 2)

[0138] and Method will be instantiated to forward_Call from Expr. 1. In this way, the comparing means 305 establishes the presence or otherwise of a function that is capable of performing the functional requirements of the input statement. (This example illustrates instantiation of an action, but in practice the input statement may also comprise factual statements, or a combination of actions and factual statements.)
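The comparison performed at S5.4.1 can be pictured, in much-reduced form, as a lookup of the event and patient of the input action against stored function descriptions. The following is a sketch under stated assumptions, not the comparing means 305: the semantic terms are collapsed to two strings, exact string equality stands in for instantiation of the uninstantiated terms, and the classes FunctionDescription and ComparingSketch, as well as the stored entries, are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical, much-reduced function description. */
final class FunctionDescription {
    final String method;  // name of the predefined function
    final String event;   // verb of the event term, e.g. "forward"
    final String patient; // noun of the patient entity, e.g. "call"

    FunctionDescription(String method, String event, String patient) {
        this.method = method;
        this.event = event;
        this.patient = patient;
    }
}

final class ComparingSketch {
    /**
     * Returns the first stored description whose event and patient equal
     * those of the input action, or empty if none fulfils the requirement.
     * ASSUMPTION: the first match is taken; the description above instead
     * asks the user to choose when several descriptions match.
     */
    static Optional<String> findMethod(List<FunctionDescription> stored,
                                       String event, String patient) {
        for (FunctionDescription fd : stored) {
            if (fd.event.equals(event) && fd.patient.equals(patient)) {
                return Optional.of(fd.method);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Invented entries standing in for the contents of the code database.
        List<FunctionDescription> stored = Arrays.asList(
            new FunctionDescription("reject_call", "reject", "call"),
            new FunctionDescription("forward_call", "forward", "call"));
        System.out.println(findMethod(stored, "forward", "call").orElse("no match"));
    }
}
```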

[0139] S5.4.2

[0140] Any input parameters, or arguments, that are required are instantiated:

[0141] If there are input parameters to be instantiated, the facts corresponding to the input parameter in Expr. 1, [ ], are assigned to Param in Expr. 2. In some cases, the analysis will not identify a value for Param in Expr. 2, in which case the comparing means 305 has two courses of action:

[0142] (i) Find and assign a default value to Param: some of the function descriptions include default parameters that may be used as input parameters in the absence of an input parameter; if this is the case, the comparing means 305 assigns the default to the input parameter. Alternatively an initialisation class or function, which contains default values, may be specified, and the respective values read in by the comparing means 305.

[0143] (ii) If there is no default parameter in the function description, the comparing means 305 sends a message to the client 101, prompting the user to supply an input parameter.
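The order of resolution described at S5.4.2 (a value found by the analysis, otherwise a default, otherwise a prompt to the user) might be sketched as below. The class ParameterResolver and its method resolve are hypothetical, and the prompt is modelled as a supplier so that it is only invoked when neither of the other two sources yields a value.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class ParameterResolver {
    /**
     * Resolves an input parameter in the order: value extracted from the
     * input statement, default from the function description, user prompt.
     * ASSUMPTION: parameters are plain strings in this sketch.
     */
    static String resolve(Optional<String> extracted,
                          Optional<String> defaultValue,
                          Supplier<String> promptUser) {
        if (extracted.isPresent()) {
            return extracted.get();
        }
        if (defaultValue.isPresent()) {
            return defaultValue.get(); // course of action (i)
        }
        return promptUser.get();       // course of action (ii)
    }

    public static void main(String[] args) {
        // No value in the input statement, but a default is available.
        String param = resolve(Optional.empty(),
                               Optional.of("0770 111 111"),
                               () -> "value supplied by the user");
        System.out.println(param);
    }
}
```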

[0144] Once the semantic content of the input statement has been instantiated, the identified functions and logical structure are submitted to the code generator 103, which, for the example above, inserts source code corresponding to X1, X2 and Y into a template. The code generator 103 runs a process that reads in the logic statements, extracts the relationship between them, in the form IF X1 && X2 THEN Y, and inserts the identified functions and statements into a Java template at predetermined locations.

[0145] This process includes analysing the corresponding semantics to form variables specific to the conditions. Thus for condition X1, “I am in a meeting”, the semantics for the parser utilised in the present embodiment are:

card(1004,sing),ref(1004,1,sing,_11368),event(1002,be:v:_11379),aspect(1002,active&_11389&pres&non_perf&non_prog),r(1003,ptoken,1002,1004),r(1005,pp(in),1002,1006),def(1006,indef),card(1006,sing),e(1006,meeting:n:_11433)

[0146] This enables derivation of:

[0147] the event which governs the condition: event(1002,be:v:_11379) → be;

[0148] the relation between the event and the entity: r(1005,pp(in),1002,1006) → pp(in);

[0149] the entity associated with the event: e(1006,meeting:n:_11433) → meeting.

[0150] These are then concatenated to form a condition variable event_relation_entity, which for this example creates variable “be_in_meeting”, and this is inserted into the template at predetermined locations, as illustrated below.

    import java.util.*;
    import java.lang.*;

    public class PolicyClass {
        protected Actions action = new Actions();
        protected Boolean call_be_urgent = false;
        protected Boolean be_in_meeting = false;
        /* bold font indicates insertion of input statement-specific code */

        public PolicyClass() { }

        public void runPolicy() {
            if ((be_in_meeting == true) && (call_be_urgent == true)) {
                action.forward_call("0770 111 111");
            }
        }
    }
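The derivation of the condition variable from the parser output can be sketched as follows. This is an illustrative sketch only: regular expressions are assumed as a convenient way of reading the event, pp relation and entity terms in the form shown above, and the class ConditionVariableSketch is hypothetical. It covers only the event_relation_entity form; other forms, such as that of call_be_urgent, are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConditionVariableSketch {
    // event(<id>,<verb>:v:...)
    private static final Pattern EVENT =
        Pattern.compile("event\\((\\d+),(\\w+):v:");
    // r(<id>,pp(<preposition>),<event id>,<entity id>)
    private static final Pattern PP_RELATION =
        Pattern.compile("r\\(\\d+,pp\\((\\w+)\\),(\\d+),(\\d+)\\)");

    /** Builds event_relation_entity, e.g. be_in_meeting, from the semantics. */
    static String conditionVariable(String semantics) {
        Matcher ev = EVENT.matcher(semantics);
        if (!ev.find()) {
            throw new IllegalArgumentException("no event term found");
        }
        String eventId = ev.group(1);
        String event = ev.group(2);

        Matcher rel = PP_RELATION.matcher(semantics);
        while (rel.find()) {
            if (!rel.group(2).equals(eventId)) {
                continue; // relation belongs to a different event
            }
            String relation = rel.group(1);
            // e(<entity id>,<noun>:n:...) for the entity named by the relation
            Matcher en = Pattern
                .compile("e\\(" + rel.group(3) + ",(\\w+):n:")
                .matcher(semantics);
            if (en.find()) {
                return event + "_" + relation + "_" + en.group(1);
            }
        }
        throw new IllegalArgumentException(
            "no pp relation with an entity for event " + eventId);
    }

    public static void main(String[] args) {
        String semantics = "card(1004,sing),ref(1004,1,sing,_11368),"
            + "event(1002,be:v:_11379),"
            + "aspect(1002,active&_11389&pres&non_perf&non_prog),"
            + "r(1003,ptoken,1002,1004),r(1005,pp(in),1002,1006),"
            + "def(1006,indef),card(1006,sing),e(1006,meeting:n:_11433)";
        System.out.println(conditionVariable(semantics)); // be_in_meeting
    }
}
```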

[0151] Thus, Boolean variables for conditions X1, X2 (be_in_meeting, call_be_urgent) are declared in the declaration section, and IF X1 && X2 THEN Y is coded into the executable part of the code. In this example, the function forward_call is defined in class Actions, which, together with its function description, is stored in the code database 200. An instance, action, of the class Actions that contains function forward_call is created in the class definition. The latter contains information such as mobile phone number etc., and is accessed for assigning the input parameter to action Y.
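Insertion into the template at predetermined locations might, purely by way of example, be realised by building the source text from the condition variables and the identified function call. The class TemplateSketch and the exact layout of the emitted text are assumptions; the embodiment does not specify how the template is stored or filled.

```java
import java.util.Arrays;
import java.util.List;

final class TemplateSketch {
    /**
     * Emits the source of a PolicyClass in which each condition variable is
     * declared as a Boolean flag and the action is guarded by their conjunction.
     * ASSUMPTION: actionCall is already a complete call such as
     * forward_call("0770 111 111").
     */
    static String fill(List<String> conditionVariables, String actionCall) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class PolicyClass {\n");
        sb.append("    protected Actions action = new Actions();\n");
        for (String v : conditionVariables) {
            sb.append("    protected Boolean ").append(v).append(" = false;\n");
        }
        // With no conditions the action is unconditional.
        String guard = conditionVariables.isEmpty()
            ? "true"
            : String.join(" && ", conditionVariables);
        sb.append("    public void runPolicy() {\n");
        sb.append("        if (").append(guard).append(") {\n");
        sb.append("            action.").append(actionCall).append(";\n");
        sb.append("        }\n");
        sb.append("    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(fill(
            Arrays.asList("be_in_meeting", "call_be_urgent"),
            "forward_call(\"0770 111 111\")"));
    }
}
```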

[0152] The setting of flags call_be_urgent, be_in_meeting occurs either by explicitly providing conditions as input statements, e.g. “I am in a meeting from 1 pm to 3 pm” (recalling that from..to are analysed as logical operators), or by linking the generator 100 to an electronic diary system, such as Microsoft™ Outlook.

[0153] Other input statements that are subsequently entered, and which require use of functions within the same class Actions (i.e. same part of the code database 200), may be analysed and added to this code. The code that is generated for this embodiment (processing of phone calls) is run each time a phone call is received, and the generator 100 runs through each of the conditions in order to retrieve a corresponding function. If an input statement relating to a different class (so different subject matter etc.) were entered, a new instance would be created from a fresh template.

[0154] Second Embodiment

[0155] Input of Data

[0156] In earlier embodiments, the data entries are typed into the terminal 101 as text via the keyboard 701 shown in FIG. 7. In the present embodiment the terminal 101 is provided with a microphone 703, and the input text is dictated and transcribed by a speech-to-text conversion program, such as ViaVoice™ available from IBM Inc.

[0157] The input speech is reproduced as text in a text input area of the screen 705, and in other respects the present embodiment operates as described above.

[0158] It is advantageous to provide the speech recognition at the terminal 101, where it is possible to train on the voice of the individual user, rather than centrally. Also, since text rather than audio is uplinked, the required uplink bandwidth is kept low. Furthermore, speech recognition requires significant computer processing and it is advantageous if this is provided on individual users' machines rather than on a central server. On the other hand, providing the generation centrally avoids the need to store multiple rules databases locally at terminals.

[0159] In this embodiment, the terminal 101 may also comprise a text-to-speech program arranged to synthesise speech from the text received from the computer 700 to provide audio output via a loudspeaker 707.

[0160] If an applet is running on a browser installed on the client terminal 101 (see below), the applet may also be arranged to generate a visual display to represent the output data. For example, the display may be a representation of a human face, or an entire human head, animated in synchronism with the output speech as described in our earlier application EP-A-225729, or a sign language display comprising an animated representation of a pair of hands generating sign language (for example British or American sign language) from a text to sign language converter program. This latter embodiment is particularly advantageous for those with hearing difficulties.

[0161] Third Embodiment

[0162] Multilingual Input Statements

[0163] In the above-described embodiments, the description assumes that the input statements are presented in the English language. However, the form of the input statement results in a representation that is substantially language-independent. The present embodiment utilises this to handle input statements in multiple languages by providing a means for storing semantics in multiple languages, together with a means for linking similar semantics across languages.

[0164] Briefly, referring to FIG. 8, the data storage 106 includes a plurality of grammar rules databases 801 a, 801 b, . . . and corresponding expansion databases 803 a, 803 b, . . . Each pair of databases 801, 803 relates to a given language. On submission of the input statement, the user specifies the language of the text (the source language), and the application from which the input statement is entered (see below) accesses the appropriate expansion and grammar rules databases to analyse the text.

[0165] The lexical database 804 in this embodiment comprises a plurality of language-specific lexicons 805 a, 805 b, . . . , each containing a word list for the language concerned, with each word of the list including a pointer to one or more entries in the lexical database 804, which stores entries comprising meaning data for meanings of each word, and a pointer back to each word for which the entry is a meaning.

[0166] Many words in different languages are directly translatable (in the sense of sharing a common meaning), such that many meaning entries in the lexical database 804 store pointers to words in each language. Not all words are directly translatable, and where meanings differ, the lexical database 804 includes additional, language-specific definitions with pointers from words in only those languages in which they occur.
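The two-way pointers between the language-specific lexicons 805 and the meaning entries of the lexical database 804 might be modelled, as a minimal sketch, with a pair of maps. The class LexicalDatabaseSketch, the language codes and the meaning identifier used in the example are all hypothetical; the embodiment does not prescribe a storage format.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LexicalDatabaseSketch {
    // language -> word -> meaning ids (pointers from each lexicon's word list)
    private final Map<String, Map<String, Set<String>>> lexicons = new HashMap<>();
    // meaning id -> "language:word" entries (pointers back to the words)
    private final Map<String, Set<String>> meanings = new HashMap<>();

    /** Records that the word, in the given language, has the given meaning. */
    void link(String language, String word, String meaningId) {
        lexicons.computeIfAbsent(language, k -> new HashMap<>())
                .computeIfAbsent(word, k -> new HashSet<>())
                .add(meaningId);
        meanings.computeIfAbsent(meaningId, k -> new TreeSet<>())
                .add(language + ":" + word);
    }

    /** Words of the target language sharing at least one meaning with the word. */
    Set<String> translate(String fromLanguage, String word, String toLanguage) {
        Set<String> result = new TreeSet<>();
        Set<String> ids = lexicons
            .getOrDefault(fromLanguage, new HashMap<>())
            .getOrDefault(word, new HashSet<>());
        String prefix = toLanguage + ":";
        for (String id : ids) {
            for (String ref : meanings.getOrDefault(id, new TreeSet<>())) {
                if (ref.startsWith(prefix)) {
                    result.add(ref.substring(prefix.length()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LexicalDatabaseSketch db = new LexicalDatabaseSketch();
        db.link("en", "call", "meaning-1"); // invented meaning identifier
        db.link("fr", "appel", "meaning-1");
        System.out.println(db.translate("en", "call", "fr"));
    }
}
```

A word whose meaning has no counterpart in the target language simply yields an empty set, which corresponds to the language-specific definitions mentioned above.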

[0167] The above description assumes that the input statements are entered manually. It may also be advantageous to have input statements translated either automatically or semi-automatically. Our earlier application number PCT 97186887.6, filed on Aug. 8, 1997 (and corresponding PCT application PCT/GB98/02389 filed on Aug. 7, 1998), discloses a method of language translation that is particularly suitable for this purpose.

[0168] Implementation

[0169] With reference to FIG. 9 of the accompanying drawings, and as described above in the context of the second embodiment, the client computer 101 comprises a keyboard 701, a VDU 705, a modem 709, and a computer 700 comprising a processor, mass storage such as a hard disk drive, and working storage, such as RAM. For example, a SUN (™) workstation or a Pentium (™) based personal computer may be employed as the client terminal 101. When the generator is loaded on a standalone computer, such as is shown in FIG. 1a, the computer 700 is computer 105.

[0170] Referring to FIG. 10, an operating control program 1010 comprising:

[0171] (i) an operating system 1012 (such as Windows™);

[0172] (ii) a browser 1014 (such as Internet Explorer™); and

[0173] (iii) an application 1016 (such as a Java™ applet, or a plain HTML file), which is designed to operate within the browser 1014,

[0174] is stored within the client terminal 101 (e.g. on the hard disk drive thereof), when the generator is loaded on a networked computer, as is shown in FIG. 1b. The function of the operating system 1012 is conventional and will not be described further. The function of the browser 1014 is to interact, in known fashion, with hypertext information received from the server 105 via the PSTN 108 and modem 709. The browser 1014 thereby downloads the applet, or plain HTML file 1016, at the beginning of the communications session, as part of a hypertext document from the server 105. The function of the HTML file 1016 is to allow the input of information for uploading to the server 105 by the user, through the browser 1014. When the computer 105 is a standalone computer, a suitable GUI software application may be used to interface with the generator instead of a browser 1014 and HTML file 1016.

[0175] When the generator 100 is run on a networked computer, then the server 105, referring to FIG. 11, comprises a communications port 1102 (e.g. a modem); a central processing unit 1104 (e.g. a mainframe computer) and a mass storage device 1106 (e.g. a hard disk drive or an array of disk drives). The server 105 comprises an operating program comprising an operating system such as Unix(™), a server program and an application program (not shown). The operating system is conventional and will not be described further. The function of the server program is to receive requests for hypertext documents from the client terminal 101 and to supply hypertext documents in reply. Specifically, the server program initially downloads a document 1016, possibly containing the applet, to the client terminal 101. The server program is also arranged to supply data to, and receive data from, the application program, via, for example, a cgi-bin mechanism or Java Remote Method Invocation (RMI) mechanism. The application program receives data (via the server program) from a client terminal 101, performs processing, and may return data (via the server program) to that client terminal for display.

[0176] In the present embodiment, the user specifies, via the HTML file 1016, which Java class should be accessed from the data storage 106. However, as described above, this is inessential to the invention, as the generator 100 could analyse all of the data contained within the store 106. Typically, the user specifies a class, or classes, when the user knows which class, or classes, provides the functional requirements of the input statements. (When the language function to be accessed is written in a non-object-oriented language, the user may specify the file containing the function(s) as required.)

[0177] Once the class has been selected, it is compiled, creating a further HTML file (not shown). This further HTML file includes a list of function descriptions corresponding to class member functions that are input to the data analyser 102 for analysis as described in steps S4.1 to S4.3 above.

[0178] As an alternative source of function descriptions, and in situations where a predefined function is not accompanied by a natural language description in the data storage 106, the generator 100 could additionally comprise means for extracting a description of the functionality of a function. For example, using code translating means, such as is commonly used to translate between the C and Fortran programming languages, the functionality associated therewith may be extracted, for instance, into a language-independent form. If the query analyser 301 were to interface with such a means, together with a data store comprising descriptions of language-dependent functions, then the functionality could be translated into natural language and be analysed for its functional capability as described in the above embodiments.

[0179] The embodiments of the present invention concern natural language inputs, where input statements are syntactically and semantically analysed using a parser. The term natural language is generally understood to mean a system for communicating which uses symbols (that is, characters) to build words. The entire set of words is the language's vocabulary, and the ways in which the words can be meaningfully combined are defined by the language's syntax and grammar. The actual meaning of words and combinations of words is defined by the language's semantics. In the limit, this syntax and grammar can be extremely simple (for example comprising action commands such as “divert call”), and the present invention is operable to accept such sparse input statements provided a suitable parser is selected for the query analyser 301.

[0180] Once the generator 100 has produced working code, the working code can be run in a variety of configurations. As stated previously, many parts of a software system tend to overlap with other systems and many software systems adopt a “three-tier” architecture, as shown in FIG. 12. The user interface tier 1201 and the data storage tier 1205 tend to be very similar for many applications, with the middle tier 1203 implementing the business logic which determines what the system does. The techniques described above can be used to generate code to implement the middle tier 1203, thus making the invention applicable to a wide range of software systems.

[0181] The working code can also be run on network devices such as routers, in order to provide a software tool for effecting changes to local network behaviour. For example, routing tables and/or route algorithm parameters may be changed in this way. In such a case the generator 100 may be located on a server computer (alternatively the middle tier 1203 if the system architecture is a three-tier architecture), and configured to operate such that the working code output therefrom is transmitted to network devices at a predetermined time. The input statement to, and running of, the generator 100 may be invoked by a system script, written, for example, in the Perl programming language, and the whole process may therefore be automated by system timers.

[0182] The working code may also effect building of reactive software agents, which are essentially computer programs, according to a natural language specification.

[0183] In a preferred embodiment, the invention is used for control of terminal devices used in a communications system, of which the telephone has been discussed above as an example. A more complete (though non-limiting) list would include: telephones, video cameras, 3D displays, personal digital assistants, cellular telephones, satellite telephones, pagers, video phones, facsimiles, payphones, qwertyphones, personal computers, lap top portable computers, engineering workstations, audio microphones, video conference suites, telemetry equipment.

[0184] In addition to these communications terminal devices, and provided there is access to the required function definitions, the generator 100 can be similarly implemented in a range of household devices, such as lighting devices, washing machines, televisions, video etc., where the selection of control parameters is currently effected manually. Manufacturers of such devices may provide a library, or equivalent, of function definitions from which a user can select a desired functionality. These functions would then be loaded into the data storage 106 for use according to the invention.

[0185] The generator 100 generates code. That code will be compiled into object code when run on a particular platform and the same code may therefore produce different behaviour in different systems.

[0186] Many modifications and variations fall within the scope of the invention, which is intended to cover all permutations and combinations of the generator described herein.

[0187] As will be understood by those skilled in the art, the invention described above may be embodied in one or more computer programs. These programs can be contained on various transmission and/or storage media such as a floppy disc, CD-ROM, or magnetic tape so that the programs can be loaded onto one or more general purpose computers or could be downloaded over a computer network using a suitable transmission medium.

[0188] Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise”, “comprising” and the like are to be construed in an inclusive as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”.

1. A method of automatically generating software from one or more predefined functions in accordance with an input statement entered in natural language, the method comprising the steps of:
     (i) analysing the input statement for its semantic content, so as to extract first semantically meaningful elements from the input statement;
     (ii) analysing the one or more predefined functions for their semantic content, so as to extract one or more sets of second semantically meaningful elements from the one or more predefined functions;
     (iii) identifying at least one of a condition, an action and/or a statement in the input statement;
     (iv) comparing the first semantically meaningful elements with the second semantically meaningful elements so as to identify one or more predefined functions that correspond to one or more action and/or statement of the input statement;
     (v) combining at least some of the first semantic elements in accordance with any conditions identified at step (iii) so as to generate corresponding condition variables;
     (vi) combining functions and condition variables identified at steps (iv) and (v) according to a set of predetermined rules in order to generate the software.

2. A method according to claim 1, in which the comparing step (iv) includes the steps of:
     a) inputting a set of second semantically meaningful elements into a predetermined rule;
     b) inputting the first semantically meaningful elements into the rule;
     c) processing the rule;
     d) evaluating the result of step (c); and
     e) repeating steps (a) to (c) for different sets of second semantically meaningful elements until a solution is evaluated at step (d).

3. A method according to claim 1 or claim 2, in which the step (v) of combining at least some of the first semantic elements to form condition variables includes the steps, for each condition identified at step (iii), of:
     a) identifying an event governing the condition;
     b) identifying an entity associated with the event;
     c) identifying a relation between the event and the entity; and
     d) concatenating the event, the entity and the relation therebetween, thereby forming a condition variable.

4. A method according to any one of the preceding claims, in which the functions and condition variables identified at steps (iv) and (v) are inserted respectively into a predetermined code template.

5. Apparatus for automatically generating software from one or more predefined functions in accordance with an input statement entered in natural language, the apparatus comprising:
     (i) extracting means for extracting first semantically meaningful elements from the input statement and for extracting one or more sets of second semantically meaningful elements from the one or more predefined functions;
     (ii) identifying means for identifying any or all of conditions, actions and/or statements in the input statement;
     (iii) comparing means for comparing first semantically meaningful elements with second semantically meaningful elements so as to identify one or more predefined functions that correspond to one or more action and/or statement in the input statement;
     (iv) first combining means for combining at least some of the first semantic elements in accordance with the conditions identified by identifying means (ii) so as to generate corresponding condition variables;
     (v) second comparing means for combining the condition variables and predefined functions identified by means (iii) and (iv) according to a set of predetermined rules in order to generate the software.

6. Apparatus according to claim 5, wherein the extracting means is a natural language parser.

7. Apparatus according to any one of claims 5 or 6, including a predetermined code template.

8. Apparatus according to any one of claims 5 to 7, wherein the identifying means identifies conditions, actions and statements in accordance with the occurrence of one or more predetermined semantic patterns.

9. Apparatus according to any one of claims 5 to 8, wherein the software is used in the control of terminal devices in a communications system.

10. Apparatus for generating code instructions for controlling equipment to carry out one or more activities, said apparatus comprising
     a) an input for receiving instructions for use in controlling the equipment;
     b) received instruction processing means for extracting one or more operations and one or more logical operators from received instructions;
     c) means for storing a plurality of code instructions for controlling the equipment to carry out activities;
     d) means for processing stored code instructions so as to identify a code instruction relevant to an operation extracted from a received instruction;
     e) means for translating extracted logical operators into code; and
     f) code generating means for selecting at least one identified code instruction and combining it with at least one translated logical operator to generate said code instructions for controlling the equipment to carry out the one or more activities.

11. Apparatus according to claim 10 wherein said received instructions comprise natural language.

12. Apparatus according to claim 11 wherein the received instruction processing means comprises a parser.

13. Apparatus according to any one of claims 10, 11 or 12 wherein the received instruction processing means also provides the means for processing stored code instructions.